Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval
Abstract
The hypothesis we explored for the Ad Hoc task of the Genomics track for TREC 2004 was that phrase-level queries would increase precision over a baseline of token-level terms. We implemented our approach using two open source tools: the Apache Jakarta Lucene TF/IDF search engine (version 1.3) and the Alias-i LingPipe tokenizer and named-entity annotator (version 1.0.6). Contrary to our intuitions, the baseline system provided better performance in terms of recall and precision for almost every query at almost every precision/recall operating point.
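As a rough illustration of the distinction between token-level and phrase-level queries, the following sketch compares the two on a toy collection. It is not the Lucene or LingPipe code used in the work: the whitespace tokenizer, the function names, the example documents, and the query are all assumptions made for illustration, and it performs plain boolean matching rather than TF/IDF ranking.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def matches_tokens(doc: str, query: str) -> bool:
    """Token-level match: every query token occurs somewhere in the document."""
    doc_tokens = set(tokenize(doc))
    return all(tok in doc_tokens for tok in tokenize(query))


def matches_phrase(doc: str, query: str) -> bool:
    """Phrase-level match: the query tokens occur contiguously and in order."""
    d, q = tokenize(doc), tokenize(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical documents and query, invented for this example.
docs = [
    "tumor necrosis factor alpha regulates apoptosis",
    "the necrosis of the tumor was a factor in the outcome",
]
query = "tumor necrosis factor"

token_hits = [matches_tokens(d, query) for d in docs]
phrase_hits = [matches_phrase(d, query) for d in docs]
print(token_hits, phrase_hits)
```

In this toy case the token-level query matches both documents, while the phrase-level query matches only the first. The phrase query is the stricter of the two: it is meant to filter out incidental co-occurrences, but it can also exclude documents that the token-level query would have retrieved.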
Similar resources
Notes on Phrasal Indexing: JSCB Evaluation Experiments at NTCIR AD HOC
The evaluation experiments of the JSCB team are described with a focus on noun phrase indexing and its weighting issues in ad hoc text retrieval. Experiments on the effect of supplemental noun phrase indexing across queries of varying length are reported. The results show that noun phrase indexing outperforms single-word-only indexing with long queries while single word o...
Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources
This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in two of the most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, ’Shabdanjali’, consisting of approx. 26K H...
Improving English and Chinese Ad-Hoc Retrieval: TIPSTER Text Phase 3 Final Report
We investigated both English and Chinese ad-hoc information retrieval (IR). One of our objectives is to study the use of term, phrasal and topical concept level evidence, either individually or in combination, to improve retrieval accuracy. For short queries, we studied five term level techniques that together lead to improvements of some 20% to 40% over standard ad-hoc 2-stage retrieval for TREC...
Enhancing access to the Bibliome: the TREC 2004 Genomics Track
BACKGROUND: The goal of the TREC Genomics Track is to improve information retrieval in the area of genomics by creating test collections that will allow researchers to improve and better understand failures of their systems. The 2004 track included an ad hoc retrieval task, simulating use of a search engine to obtain documents about biomedical topics. This paper describes the Genomics Track of t...
Genomic Information Retrieval Through Selective Extraction and Tagging by the ASU-BioAL Group
In this paper we describe the approach used by the Arizona State University BioAI group for the ad-hoc retrieval task of the TREC Genomics Track 2005. We pre-process the TREC query expressions by adding the synonyms of genes, diseases, bio-processes, functions of organs, and selectively adding stemmed verbs, nouns, and MeSH Heading categories. The pre-processed queries are used to perform initial s...